Coastal barometer blog | Coastal barometer website |
Here I will run the exploration of the response variables (aquaculture production and fisheries production) and of the covariates
library(vroom)
library(here)
library(lattice)
library(cowplot)seafood <-read.csv(here("prep", "output_data", "spfood_allvars.csv"))Calculate total sea food production here as well. I also create a log-transformed fisheries catch, because it is highly skewed to the left (see histograms below)
seafood_prep <- seafood |>
rowwise() |>
mutate(totprod = sum(aqua_prod_ton, total_catch_ton, na.rm=T)) |>
mutate(aqua_prop = aqua_prod_ton/totprod) |>
mutate(fish_catch_log = log(total_catch_ton + 1)) |>
mutate(aqua_prod_log = log(aqua_prod_ton + 1))I will use extensively the approaches and functions from the two Highland statistics ltd books: * Mixed effects models and extensions in ecology with R. (2009). Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer
and
Cleveland dotplots of the variables
Mydotplot(seafood_prep[,c(7:14)])We observe an excess of zeros or close-to-zero values in the population, population growth, unemployed, and total catch variables.
Check the % of zeroes in the variables:
zeros_prop <- function(x){ #x is a numerical vector
rows = sum(!is.na(x))
zeros = sum(x == 0, na.rm = T)
prop = zeros/rows
prop
}myvars <- seafood_prep[,c("aqua_prod_ton",
"total_catch_ton",
"distance",
"population",
"popgrowth",
"sea_area",
"percent_wforce",
"unemployed"
)]
map(myvars, ~zeros_prop(.x))## $aqua_prod_ton
## [1] 0.00119
##
## $total_catch_ton
## [1] 0.154
##
## $distance
## [1] 0
##
## $population
## [1] 0.00928
##
## $popgrowth
## [1] 0.0441
##
## $sea_area
## [1] 0
##
## $percent_wforce
## [1] 0.00812
##
## $unemployed
## [1] 0
Only fisheries has 15% of zeros, other variables do not suffer from zero inflation
Check out the distribution of variables:
varnames <- list("Aquaculture",
"Fisheries",
"Distance South to North",
"Population",
"Population growth",
"Sea area",
"Percent in workforce",
"Number of unemployed" )
par(mfrow = c(2,4))
hist <- map2(myvars,varnames, ~ hist(.x, main = .y, col = "seagreen", xlab = " ", ylab = "", cex.main=1.1))For the mixed models, I will use only 59 municipalities, that is, 817 out of 862 observations (that had at least 10 years of data). Let’s see if historgamrs are differenct for these observations.
mcp59 <- read_csv(here::here("prep", "output_data", "mcp_59selected.csv"))
myvars59 <- mcp59[,c("aqua_prod_ton",
"total_catch_ton",
"distance",
"population",
"popgrowth",
"sea_area",
"percent_wforce",
"unemployed")]
par(mfrow = c(2,4))
hist <- map2(myvars59,varnames, ~ hist(.x, main = .y, col = "seagreen", xlab = " ", ylab = ""))Most of the variables are left-skewed, but fisheries may need to be transformed, or later, the response variable aquaculture/(aquaculture + fisheries) may need to be transformed.
The boxplots of the variables, to check if there are outliers (formally): yes, since the variables are all left-skewed
par(mfrow = c(2,4))
map2(myvars,varnames, ~ boxplot(.x, main = .y, col = "seagreen", xlab = " "))Let’s take a look at the variable aqua_prop which is a proportion of aquaculture in the total seafood production and at the log of the fisheries catches
par(mfrow=c(1,3))
hist(seafood_prep$aqua_prop, main = "Aquaculture proportion", xlab = "", col = "orchid")
hist(seafood_prep$fish_catch_log, main = "Log of fisheries catches", xlab = "", col = "orchid")
hist(seafood_prep$aqua_prod_log, main = "Log of aquaculture production", xlab = "", col = "orchid")Mypairs(myvars59)Clear association is only seen for the population and growth, and between the number of unemployed and population growth (positive relationship in both cases).
MyX <- c("year", "sea_area", "population", "popgrowth" , "unemployed", "distance", "percent_wforce")
MyMultipanel.ggp2(Z = seafood_prep,
varx = MyX,
vary = "aqua_prop",
ylab = "aquaculture % in seafood production",
addSmoother = TRUE,
addRegressionLine = FALSE,
addHorizontalLine = FALSE) ## Warning: Removed 210 rows containing non-finite values (stat_smooth).
## Warning: Removed 210 rows containing missing values (geom_point).
No clear associations or patterns, but i need to consider transformation of a response variable.
What if aquaculture production alone would be used as a response variable? Let’s see also if fisheries catch has anything to do with aquaculture produciton
MyMultipanel.ggp2(Z = seafood_prep,
varx = MyX,
vary = "aqua_prod_ton",
ylab = "aquaculture production",
addSmoother = TRUE,
addRegressionLine = FALSE,
addHorizontalLine = FALSE) ## Warning: Removed 210 rows containing non-finite values (stat_smooth).
## Warning: Removed 210 rows containing missing values (geom_point).
This looks more interesting! Relationships are close to linear, but this result include all municipalities, it might be different per municipality.
Might be interesting to examine their associations statistically
fish_aqua <-ggplot(
data = seafood_prep,
mapping = aes(x = total_catch_ton, y = aqua_prod_ton)) +
geom_point() +
geom_smooth()
fish_aqua## Warning: Removed 25 rows containing non-finite values (stat_smooth).
## Warning: Removed 25 rows containing missing values (geom_point).
MyMultipanel.ggp2(Z = seafood_prep,
varx = MyX,
vary = "total_catch_ton",
ylab = "fisheries catch",
addSmoother = TRUE,
addRegressionLine = FALSE,
addHorizontalLine = FALSE) ## Warning: Removed 35 rows containing non-finite values (stat_smooth).
## Warning: Removed 35 rows containing missing values (geom_point).
I maybe will need to transform fisheries catch to log(catch) for the analysis
Just to see if the distribution of fisheries and aquaculture production over years look very different when plotted per municipality. There is a clear increase in the production over years!
ggplot(data = seafood_prep) +
geom_point(mapping = aes(x = year, y = aqua_prod_ton), size = 0.5) +
geom_smooth(mapping = aes(x = year, y = aqua_prod_ton), size = 0.4) +
facet_wrap( municip ~ .) +
theme(
axis.text.x = element_text(size = 10, angle = 90),
legend.position = "none",
panel.background = element_rect(fill = "white")) Same but for fisheries
ggplot(data = seafood_prep) +
geom_point(mapping = aes(x = year, y = total_catch_ton), size = 0.5) +
geom_smooth(mapping = aes(x = year, y = total_catch_ton), size = 0.4) +
facet_wrap( municip ~ .) +
theme(
axis.text.x = element_text(size = 10, angle = 90),
legend.position = "none",
panel.background = element_rect(fill = "white"))I am already quite sure that both responses will be temporally and spatially dependent. But to test for that formally, we can use an autocorrelation test:
seafood_temp <- arrange(seafood_prep, year)
acf(seafood_temp$aqua_prod_ton, na.action = na.pass)acf(seafood_temp$total_catch_ton)